StyleGAN as a Utility-Preserving Face De-identification Method
Face de-identification methods have been proposed to preserve users' privacy
by obscuring their faces. These methods, however, can degrade the quality of
photos, and they usually do not preserve the utility of faces, i.e., their age,
gender, pose, and facial expression. Recently, GANs such as StyleGAN have
been proposed that generate realistic, high-quality synthetic faces. In this
paper, we investigate the use of StyleGAN in generating de-identified faces
through style mixing. We examined this de-identification method for preserving
utility and privacy by implementing several face detection, verification, and
identification attacks and conducting a user study. The results from our
extensive experiments, human evaluation, and comparison with two
state-of-the-art methods, i.e., CIAGAN and DeepPrivacy, show that StyleGAN
performs on par with or better than these methods, preserving users' privacy and
images' utility. In particular, the results of the machine learning-based
experiments show that StyleGAN0-4 preserves utility better than CIAGAN and
DeepPrivacy while preserving privacy at the same level. StyleGAN0-3 preserves
utility at the same level while providing more privacy. In this paper, for the
first time, we also performed a carefully designed user study to examine both
privacy and utility-preserving properties of StyleGAN0-3, 0-4, and 0-5, as well
as CIAGAN and DeepPrivacy from the human observers' perspectives. Our
statistical tests showed that participants tend to verify and identify
StyleGAN0-5 images more easily than DeepPrivacy images. All the methods but
StyleGAN0-5 had significantly lower identification rates than CIAGAN. Regarding
utility, as expected, StyleGAN0-5 performed significantly better in preserving
some attributes. Among all methods, on average, participants believe gender has
been preserved the most, while naturalness has been preserved the least.
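The "StyleGAN0-k" variants refer to style mixing over the generator's first
k+1 layers. As a rough illustration of that idea, the sketch below swaps the
coarse styles of a source face for those of a random identity; the
`generator` interface, the shapes, and the function names are assumptions for
the example, not the paper's code.

```python
# Minimal sketch of style-mixing de-identification (hypothetical generator API).
# "StyleGAN0-k" swaps the styles of layers 0..k of the source face for those of
# a randomly sampled identity, keeping the remaining (finer) layers intact.
import numpy as np

def style_mix_deidentify(generator, w_source, k, rng=None):
    """generator: pre-trained StyleGAN with mapping()/synthesis() (assumed API).
    w_source: per-layer W+ code of the face to de-identify, shape (L, w_dim).
    k: last layer whose style is replaced (0-3, 0-4, or 0-5 in the paper)."""
    rng = rng or np.random.default_rng()
    num_layers, w_dim = w_source.shape
    z = rng.standard_normal(w_dim)                              # random identity latent
    w_random = np.tile(generator.mapping(z), (num_layers, 1))   # assumed call
    w_mixed = w_source.copy()
    w_mixed[: k + 1] = w_random[: k + 1]  # identity-carrying coarse layers swapped
    return generator.synthesis(w_mixed)   # finer pose/expression layers kept -> utility
```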
Impact of Stricter Content Moderation on Parler's Users' Discourse
Social media platforms employ various content moderation techniques to remove
harmful, offensive, and hate speech content. The moderation level varies across
platforms, and it can evolve within a platform over time. For example, Parler, a
fringe social media platform popular among conservative users, was known to
have the least restrictive moderation policies, claiming to have open
discussion spaces for their users. However, after the 2021 US Capitol Riots
were linked to the activity of groups on Parler such as QAnon and the Proud
Boys, Parler was removed from the Apple and Google app stores on January 12,
2021, and suspended from Amazon's cloud hosting service. Parler would have to modify its
moderation policies to return to these online stores. After a month of
downtime, Parler was back online with a new set of user guidelines, which
reflected stricter content moderation, especially regarding the "hate
speech" policy.
In this paper, we studied the moderation changes performed by Parler and
their effect on the toxicity of its content. We collected a large longitudinal
Parler dataset with 17M parleys from 432K active users from February 2021 to
January 2022, after its return to the Internet and App Store. To the best of
our knowledge, this is the first study investigating the effectiveness of
content moderation techniques using data-driven approaches, and ours is the
first Parler dataset collected after the platform's brief hiatus. Our quasi-experimental time series
analysis indicates that after the change in Parler's moderation, the severe
forms of toxicity (above a threshold of 0.5) immediately decreased, and the
reduction was sustained. In contrast, the trend did not change for less severe
threats and insults (thresholds between 0.5 and 0.7). Finally, we found an increase in the
factuality of the news sites being shared, as well as a decrease in the number
of conspiracy or pseudoscience sources being shared.
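As a hedged illustration of the quasi-experimental design described above,
the sketch below fits a standard interrupted time-series (segmented)
regression to a weekly toxicity-rate series; the column names, the weekly
aggregation, and the regression form are assumptions for the example, not the
paper's exact specification.

```python
# Interrupted time-series (ITS) sketch: level change at the moderation change,
# plus a possible change in trend afterwards. Columns are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

def its_fit(df: pd.DataFrame, intervention_week: int):
    """df columns: week (0, 1, 2, ...) and rate (share of parleys whose
    severe-toxicity score exceeds the chosen threshold, e.g. 0.5)."""
    df = df.copy()
    df["post"] = (df["week"] >= intervention_week).astype(int)        # level shift
    df["post_trend"] = df["post"] * (df["week"] - intervention_week)  # slope shift
    # rate = b0 + b1*week (pre-trend) + b2*post (immediate change)
    #        + b3*post_trend (sustained change in trend)
    return smf.ols("rate ~ week + post + post_trend", data=df).fit()

# A significant negative `post` coefficient with no rebound in `post_trend`
# would match the "immediately decreased and sustained" finding.
```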
Understanding the Bystander Effect on Toxic Twitter Conversations
In this study, we explore the power of group dynamics to shape the toxicity
of Twitter conversations. First, we examine how the presence of others in a
conversation can potentially diffuse Twitter users' responsibility to address a
toxic direct reply. Second, we examine whether the toxicity of the first direct
reply to a toxic tweet in conversations establishes the group norms for
subsequent replies. By doing so, we outline how bystanders and the tone of
initial responses to a toxic reply are explanatory factors affecting whether
others feel uninhibited about posting their own abusive or derogatory replies. We
test this premise by analyzing a random sample of more than 156k tweets
belonging to ~9k conversations. Central to this work is the social
psychological research on the "bystander effect" documenting that the presence
of bystanders has the power to alter the dynamics of a social situation. If the
first direct reply reaffirms the divisive tone, other replies may follow suit.
We find evidence of a bystander effect, with our results showing that an
increased number of users participating in the conversation before receiving a
toxic tweet is negatively associated with the number of Twitter users who
responded to the toxic reply in a non-toxic way. We also find that the initial
responses to toxic tweets within conversations are of great importance. Posting
a toxic reply immediately after a toxic comment is negatively associated with
users posting non-toxic replies and Twitter conversations becoming increasingly
toxic.
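As an illustration of the two quantities this analysis relates, the sketch
below counts the bystanders present before a toxic direct reply and the users
who subsequently respond non-toxically; the Tweet fields and the assumption of
precomputed toxicity labels (e.g., thresholded Perspective API scores) are
illustrative, not the study's actual pipeline.

```python
# Conversation-level bystander features, assuming precomputed toxicity labels.
from dataclasses import dataclass

@dataclass
class Tweet:
    user: str
    timestamp: float
    is_toxic: bool

def bystander_features(conversation: list[Tweet], toxic_index: int):
    """conversation: tweets in chronological order;
    toxic_index: position of the first toxic direct reply."""
    before = conversation[:toxic_index]
    after = conversation[toxic_index + 1:]
    toxic_user = conversation[toxic_index].user
    # Bystanders: distinct users already in the thread before the toxic reply.
    bystanders = {t.user for t in before} - {toxic_user}
    # Non-toxic responders: distinct users replying non-toxically afterwards.
    nontoxic_responders = {t.user for t in after if not t.is_toxic}
    return len(bystanders), len(nontoxic_responders)
```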
From Chatbots to PhishBots? -- Preventing Phishing Scams Created Using ChatGPT, Google Bard, and Claude
The advanced capabilities of Large Language Models (LLMs) have made them
invaluable across various applications, from conversational agents and content
creation to data analysis, research, and innovation. However, their
effectiveness and accessibility also render them susceptible to abuse for
generating malicious content, including phishing attacks. This study explores
the potential of using four popular, commercially available LLMs, namely
ChatGPT (GPT-3.5 Turbo), GPT-4, Claude, and Bard, to generate functional phishing attacks
using a series of malicious prompts. We discover that these LLMs can generate
both phishing emails and websites that can convincingly imitate well-known
brands, and also deploy a range of evasive tactics for the latter to elude
detection mechanisms employed by anti-phishing systems. Notably, these attacks
can be generated using unmodified, or "vanilla," versions of these LLMs,
without requiring any prior adversarial exploits such as jailbreaking. As a
countermeasure, we build a BERT based automated detection tool that can be used
for the early detection of malicious prompts to prevent LLMs from generating
phishing content, attaining an accuracy of 97% for phishing website prompts
and 94% for phishing email prompts.
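A minimal sketch of what such a BERT-based prompt filter could look like with
the Hugging Face Transformers API is shown below; the checkpoint name, the
binary label mapping, and the decision threshold are assumptions, since the
authors' fine-tuned model is not specified here.

```python
# BERT-based malicious-prompt classifier sketch (Hugging Face Transformers).
# The checkpoint is a placeholder for a model fine-tuned on
# (prompt, malicious/benign) pairs; label index 1 = "malicious" is assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "bert-base-uncased"  # placeholder; a fine-tuned checkpoint in practice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

def is_malicious_prompt(prompt: str, threshold: float = 0.5) -> bool:
    """Flag a prompt before it reaches the LLM if the classifier's
    probability of the malicious class exceeds the threshold."""
    inputs = tokenizer(prompt, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    return probs[0, 1].item() > threshold
```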
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social networks tends to propagate according to the interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks.
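To make the propagation intuition concrete, the sketch below scores a message
by how far its topic distribution diverges from the interest profiles of the
communities it actually reached; community assignments (e.g., via Louvain)
and topic models (e.g., LDA) are assumed precomputed, and this scoring rule
is an illustration, not POISED's actual classifier.

```python
# Propagation-anomaly sketch: spam tends to spread through communities whose
# interests do not match the message's topics.
import numpy as np

def propagation_anomaly(msg_topics: np.ndarray,
                        visited_communities: list[int],
                        community_profiles: np.ndarray) -> float:
    """msg_topics: topic distribution of the message, shape (T,).
    community_profiles: average topic distribution per community, shape (C, T).
    Returns the mean Jensen-Shannon divergence from the visited communities;
    a high value suggests spam-like, off-the-beaten-path propagation."""
    def js(p, q, eps=1e-12):
        p, q = p + eps, q + eps
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a * np.log(a / b))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.mean([js(msg_topics, community_profiles[c])
                          for c in visited_communities]))
```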
User Engagement and the Toxicity of Tweets
Twitter is one of the most popular online micro-blogging and social
networking platforms. This platform allows individuals to freely express
opinions and interact with others regardless of geographic barriers. However,
with the good that online platforms offer also comes the bad. Twitter and
other social networking platforms have created new spaces for incivility. With
the growing interest in the consequences of uncivil behavior online,
understanding how a toxic comment impacts online interactions is imperative. We
analyze a random sample of more than 85,300 Twitter conversations to examine
differences between toxic and non-toxic conversations and the relationship
between toxicity and user engagement. We find that toxic conversations, those
with at least one toxic tweet, are longer but have fewer individual users
contributing to the dialogue compared to the non-toxic conversations. However,
within toxic conversations, toxicity is positively associated with more
individual Twitter users participating in conversations. This suggests that
overall, more visible conversations are more likely to include toxic replies.
Additionally, we examine the sequencing of toxic tweets and its impact on
conversations. Toxic tweets often occur as the main tweet or as the first
reply, and lead to greater overall conversation toxicity. We also find a
relationship between the toxicity of the first reply to a toxic tweet and the
toxicity of the conversation, such that whether the first reply is toxic or
non-toxic sets the stage for the overall toxicity of the conversation,
following the idea that hate can beget hate.
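As a small illustration of the operationalization used here (a conversation
counts as toxic if it contains at least one toxic tweet), the sketch below
computes the engagement measures the abstract compares; the column names and
the 0.5 toxicity threshold are assumptions for the example.

```python
# Label conversations and compare engagement between toxic and non-toxic ones.
import pandas as pd

def conversation_stats(tweets: pd.DataFrame, threshold: float = 0.5):
    """tweets columns: conversation_id, user_id, toxicity (score in [0, 1])."""
    g = tweets.groupby("conversation_id")
    stats = g.agg(length=("user_id", "size"),
                  unique_users=("user_id", "nunique"),
                  max_toxicity=("toxicity", "max"))
    # A conversation is toxic if any tweet in it crosses the threshold.
    stats["is_toxic"] = stats["max_toxicity"] >= threshold
    # Mean length and distinct-user counts for toxic vs. non-toxic threads.
    return stats.groupby("is_toxic")[["length", "unique_users"]].mean()
```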
Exploring Gender-Based Toxic Speech on Twitter in Context of the #MeToo movement: A Mixed Methods Approach
The #MeToo movement has catalyzed widespread public discourse surrounding
sexual harassment and assault, empowering survivors to share their stories and
holding perpetrators accountable. While the movement has had a substantial and
largely positive influence, this study aims to examine the potential negative
consequences in the form of increased hostility against women and men on the
social media platform Twitter. By analyzing tweets shared between October 2017
and January 2020 by more than 47.1k individuals who had either disclosed their
own sexual abuse experiences on Twitter or engaged in discussions about the
movement, we identify an overall increase in gender-based hostility towards
both women and men since the start of the movement. We also monitor 16 pivotal
real-life events that shaped the #MeToo movement to identify how these events
may have amplified negative discussions targeting the opposite gender on
Twitter. Furthermore, we conduct a thematic content analysis of a subset of
gender-based hostile tweets, which helps us identify recurring themes and
underlying motivations driving the expressions of anger and resentment from
both men and women concerning the #MeToo movement. This study highlights the
need for a nuanced understanding of the impact of social movements on online
discourse and underscores the importance of addressing gender-based hostility
in the digital sphere.
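One hedged way to implement the event monitoring the abstract describes is a
before/after window comparison around each of the 16 pivotal events, as
sketched below; the column names, window size, and choice of a two-proportion
z-test are assumptions for the example, not the study's stated method.

```python
# Event-window sketch: did the hostility rate rise after a pivotal event?
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

def event_effect(tweets: pd.DataFrame, event_date: str, window_days: int = 14):
    """tweets columns: date (datetime64), is_hostile (bool, from a
    gender-based toxic-speech classifier)."""
    event = pd.Timestamp(event_date)
    pre = tweets[(tweets.date >= event - pd.Timedelta(days=window_days)) &
                 (tweets.date < event)]
    post = tweets[(tweets.date >= event) &
                  (tweets.date < event + pd.Timedelta(days=window_days))]
    counts = [post.is_hostile.sum(), pre.is_hostile.sum()]
    nobs = [len(post), len(pre)]
    stat, pval = proportions_ztest(counts, nobs)  # post vs. pre hostility rate
    return stat, pval
```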